import numpy as np
import pandas as pd
import random
from matplotlib import pyplot as plt
import seaborn as sns
The Ras_SOCS.xlsx file has been retrieved from UniProtKB via the advanced search construction:
database:(type:pfam ras) database:(type:pfam socs_box)
All the proteins found belong to diverse bilaterians.
table1 = pd.read_excel("./Ras_SOCS.xlsx")
table1.head()
table1.shape
table1.isna().any()
table1[table1["Taxonomic lineage (FAMILY)"].isna()]
A few entries with unidentified families, all in the large class Aves, are safe to ignore: they will simply not be accounted for during across-families sampling.
table1[table1["Taxonomic lineage (CLASS)"].isna()]
It appears that all the entries with unidentified class names are either crocodiles or turtles, so they can be united under the Archosauromorpha name:
table1.loc[table1["Taxonomic lineage (CLASS)"].isna(), "Taxonomic lineage (CLASS)"] = "Archosauromorpha"
table1["Taxonomic lineage (CLASS)"].unique()
table1[table1["Cross-reference (Pfam)"] != "PF00071;PF07525;"].shape[0]
The advanced search construction used to retrieve the table1 sequences, in fact, finds all the proteins that have the two queried domains, but it pays no attention to any other domains present in those proteins. Such extra domains may be interleaved with the two target domains, and in some of the found two-domain proteins the order of the domains may even be inverted. On the other hand, a multidomain protein may still be homologous to the target two-domain proteins if the extra domain was added to the N- or C-terminus of the two-domain architecture during a recent evolutionary event.
Thus it is hard to discern non-homologues in the table. Since there are only 9 extra-domain proteins in the table and some of them may be homologous to the target architecture, it has been decided to retain them, as this should introduce no great bias.
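The "9 extra-domain proteins" figure comes from comparing each entry's Pfam cross-references against the exact target pair. A minimal sketch of that check on a toy frame (the entry names and the third Pfam ID are invented; only PF00071 and PF07525 come from the query above):

```python
import pandas as pd

# Toy stand-in for table1's "Cross-reference (Pfam)" column; the real target
# IDs are PF00071 (Ras) and PF07525 (SOCS_box), everything else is made up.
toy = pd.DataFrame({
    "Entry": ["P1", "P2", "P3"],
    "Cross-reference (Pfam)": ["PF00071;PF07525;",
                               "PF00071;PF07525;",
                               "PF00071;PF07525;PF00069;"],
})

target = {"PF00071", "PF07525"}
# An entry carries extra domains if its Pfam set differs from the target pair
extra = toy["Cross-reference (Pfam)"].apply(
    lambda s: set(filter(None, s.split(";"))) != target)
print(int(extra.sum()))  # -> 1
```

Note that a set comparison ignores domain order, so inverted-architecture proteins are not flagged by this check.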
plt.figure(figsize=[9, 7])
a = plt.hist(table1["Length"], bins=25, density=False)
plt.xticks(a[1], rotation=45, size=12)
plt.yticks(size=12)
plt.xlabel("Protein length", size=14)
plt.ylabel("Entries per bin", size=14)
plt.grid(True, axis="y")
plt.title("Protein length histogram", size=16)
plt.show()
Based on the histogram above, proteins with the length range from 200 to 330 aa are further considered as of typical length:
typical = table1[(table1["Length"] > 200) & (table1["Length"] < 330)]
typical["Taxonomic lineage (PHYLUM)"].nunique(),\
typical["Taxonomic lineage (CLASS)"].nunique(), typical["Taxonomic lineage (FAMILY)"].nunique()
typical["Taxonomic lineage (CLASS)"].value_counts()
A small sample of proteins of typical length for HMM profile building is being collected with special attention to taxonomic diversity:
sampled_AC = []
random.seed(7)
for cl in typical["Taxonomic lineage (CLASS)"].unique():
    cl_table = typical[typical["Taxonomic lineage (CLASS)"] == cl]
    if cl_table.shape[0] <= 3:
        sampled_AC += list(cl_table["Entry"])
    else:
        if cl_table["Taxonomic lineage (FAMILY)"].nunique() < 3:
            entries = list(cl_table["Entry"])
            random.shuffle(entries)
            sampled_AC += entries[:3]
        else:
            families = list(cl_table["Taxonomic lineage (FAMILY)"].unique())
            random.shuffle(families)
            families = families[:3]
            for fam in families:
                entries = list(cl_table[cl_table["Taxonomic lineage (FAMILY)"] == fam]["Entry"])
                sampled_AC.append(random.choice(entries))
random.seed()
del typical
len(sampled_AC)
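The class-stratified part of the loop above can also be expressed with a pandas groupby; a simplified sketch on an invented frame (the per-family balancing within large classes is omitted here for brevity):

```python
import pandas as pd

# Toy stand-in for the "typical" table; class names are illustrative only
toy = pd.DataFrame({
    "Entry": list("ABCDEFG"),
    "Taxonomic lineage (CLASS)": ["Aves"] * 5 + ["Mammalia"] * 2,
})

# Take at most three entries per class, keeping small classes whole
sample = (toy.groupby("Taxonomic lineage (CLASS)", group_keys=False)
             .apply(lambda g: g.sample(min(len(g), 3), random_state=7)))
print(len(sample))  # -> 5
```

The explicit loop in the notebook stays easier to extend with the family-level sampling rule, which is presumably why it was written out longhand.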
table1["Sampled"] = table1["Entry"].apply(lambda x: x in sampled_AC)
table1.to_excel("tables.xlsx", sheet_name="table_1", index=False)
Retrieving sampled sequences from UniProt:
for AC in sampled_AC:
    !wsl wget https://www.uniprot.org/uniprot/{AC}.fasta
!wsl cat ./*.fasta > sample.fa
!wsl rm ./*.fasta
Adjusting retrieved sequence names:
!wsl sed -i -r "/>/s/^>(sp|tr)\|(.+)\|.+$/>\2/" ./sample.fa
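The sed command above rewrites UniProt FASTA headers such as `>sp|Q12345|NAME_HUMAN description` to just the accession, `>Q12345`. The same substitution in Python, for readers less familiar with sed (the header text here is invented):

```python
import re

header = ">sp|Q12345|RASL_HUMAN Ras-like protein"  # illustrative header
# Same extended regex as the sed call: keep only the accession field
print(re.sub(r"^>(sp|tr)\|(.+)\|.+$", r">\2", header))  # -> >Q12345
```

The greedy `(.+)` is safe here because UniProt headers contain exactly two `|` separators, so the capture stops at the last one.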
The alignment has been performed with the MAFFT v7.475 program. As the domains of interest are situated quite closely to each other, in one conservative protein region, L-INS-I algorithm has been used:
In Unix cmd:
linsi sample.fa > alignment.afa
The obtained alignment has then been refined manually in the Jalview editor:
The N-terminus has been trimmed up to the 99th alignment position; the C-terminus appears conservative and has therefore been left untrimmed. Sequences 25, 39, 47, 48 and 24, 31, 42, 8 have been removed for having non-conservative N- or C-termini, respectively. Sequences containing internal insertions have been retained.
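For the record, the trimming step has a simple programmatic analogue: slice every aligned sequence from the chosen column and drop sequences whose trimmed terminus is still gap-led. A sketch on invented sequences (the trim column and gap criterion here are illustrative, not the actual Jalview decisions):

```python
# Toy aligned sequences; "-" marks alignment gaps
records = {
    "seq1": "----MKTAYR" + "A" * 20,   # gappy N-terminus
    "seq2": "MSTQEKTAYR" + "A" * 20,
}
TRIM_FROM = 2  # would be 98 for the real trim up to the 99th position

# Cut every sequence at the same column, then discard any sequence whose
# trimmed N-terminus still opens with a gap (non-conservative N-term)
trimmed = {name: seq[TRIM_FROM:] for name, seq in records.items()}
kept = {name: seq for name, seq in trimmed.items() if not seq.startswith("-")}
print(sorted(kept))  # -> ['seq2']
```

Manual inspection in Jalview remains the more reliable route, since "non-conservative" termini are judged against the whole column, not just leading gaps.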
The resulting alignment is in the refined_alignment.afa file.
The HMM profile, configured for global alignment finding, has been built with hmm2build from the HMMER 2.3.2 package:
In Unix cmd:
hmm2build -g -n Ras_SOCS ras_socs.hmm refined_alignment.afa
Then it has been calibrated with default settings:
In Unix cmd:
hmm2calibrate --cpu 1 ras_socs.hmm
To find a cutoff score with a reasonable sensitivity-to-specificity ratio for the built HMM profile, the SOCS.fasta file, containing all the SOCS_box-domain protein sequences of metazoan origin, has been retrieved from UniProtKB via the advanced search construction:
database:(type:pfam socs_box) taxonomy:"Metazoa [33208]"
Adjusting retrieved sequence names:
!wsl sed -i -r "/>/s/^>(sp|tr)\|(.+)\|.+$/>\2/" ./SOCS.fasta
Next, search for the domain architectures, encoded by the built HMM profile, has been undertaken in the retrieved proteins via hmm2search from the HMMER 2.3.2 package:
In Unix cmd:
hmm2search --cpu 1 -E 10 ras_socs.hmm SOCS.fasta > results.txt
Such a high E-value has been taken because it was previously shown that -E 0.1 yields an output only 2 sequences longer than table1, which would be of little use for profile testing (data not shown).
Adjusting the output to tsv format and loading it:
!wsl sed -r -e '1,/Parsed for domains/d' -e '/Alignments of top-scoring domains/,$d' -e '/-----/d' results.txt |\
wsl sed -r 's/ +/\t/g' | wsl sed -r -e '1s/(seq-t\t)/\1coverage_seq\t/' -e '1s/(hmm-t\t)/\1coverage_hmm\t/' > results.tsv
table2 = pd.read_csv("./results.tsv", sep="\t")
Marking the true findings in the output:
table1_AC = list(table1["Entry"])
table2["in_table1"] = table2["Sequence"].apply(lambda x: x in table1_AC)
del table1_AC
Sorting table2 by decreasing score and calculating Sensitivity and Specificity values for each score (as if it were taken as a cutoff):
table2 = table2.sort_values(by="score", ascending=False).reset_index(drop=True)
seq_num = !wsl grep -c ">" SOCS.fasta
seq_num = int(seq_num[0])
good_seq_num = table1.shape[0]
bad_seq_num = seq_num - good_seq_num
rejected_bad_seq_num = bad_seq_num - (~table2["in_table1"]).agg(np.sum)
#Sensitivity|cutoff = TP|cutoff / (TP + FN)
table2["Sensitivity"] = [table2["in_table1"][:n].agg(np.sum) / good_seq_num for n in table2.index]
#Specificity|cutoff = TN|cutoff / (TN + FP)
table2["Specificity"] = [((~table2["in_table1"][n:]).agg(np.sum) + rejected_bad_seq_num) / bad_seq_num for n in table2.index]
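The two list comprehensions above rescan the column for every row, which is O(n²); since the table is already sorted by descending score, the same curves fall out of cumulative sums. A toy equivalence check on invented counts (not the real SOCS.fasta numbers):

```python
import numpy as np
import pandas as pd

t = pd.DataFrame({"in_table1": [True, True, False, True, False]})
good, bad, rejected = 3, 4, 2  # positives; negatives; negatives absent from output

tp_above = t["in_table1"].cumsum().shift(fill_value=0)     # TP above each row's score
fp_above = (~t["in_table1"]).cumsum().shift(fill_value=0)  # FP above each row's score
sens = tp_above / good
spec = (bad - fp_above) / bad  # TN = negatives never admitted above the cutoff

# Row-by-row match with the list-comprehension definitions
ref_sens = [t["in_table1"][:n].sum() / good for n in t.index]
ref_spec = [((~t["in_table1"][n:]).sum() + rejected) / bad for n in t.index]
print(np.allclose(sens, ref_sens) and np.allclose(spec, ref_spec))  # -> True
```

For tables of this size (~1300 rows) the quadratic version is perfectly serviceable; the cumulative-sum form mainly matters for much larger scans.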
with pd.ExcelWriter("tables.xlsx", engine="openpyxl", mode="a") as writer:
    table2.to_excel(writer, sheet_name="table_2", index=False)
ROC curve:
fig = plt.figure(figsize=[8, 6])
big_ax = fig.add_axes([0, 0, 1, 1])
big_ax.plot(1 - table2["Specificity"], table2["Sensitivity"])
big_ax.grid(True)
plt.xticks(size=12)
plt.yticks(size=12)
big_ax.set_xlabel("1 - specificity", size=14)
big_ax.set_ylabel("sensitivity", size=14)
plt.annotate("", xy=(0.0001, 1), xytext=(0.00035, 0.703), arrowprops=dict(arrowstyle="-"))
plt.annotate("", xy=(0.000903, 1.005), xytext=(0.000805, 0.703), arrowprops=dict(arrowstyle="-"))
plt.title("ROC curve", size=16)
small_ax = fig.add_axes([0.4, 0.2, 0.5, 0.5])
small_ax.plot(1 - table2["Specificity"], table2["Sensitivity"])
small_ax.set_xlim(0.0001, 0.0009)
small_ax.set_ylim(0.9950, 1.0005)
plt.xticks(rotation=45)
plt.yticks(np.arange(995, 1001)/1000)
small_ax.grid(True)
plt.show()
From the curve above it is evident that only a negligibly small decrease in specificity is needed to reach 100% sensitivity. So any reasonable score cutoff resulting in 100% sensitivity would also have almost perfect specificity. Conversely, a score cutoff with almost 100% specificity would result in >95% sensitivity. The profile is inherently both very sensitive and very specific.
Defining the score cutoff:
The first finding absent from table1 (A0A3M0KYX4) has a distinctly high score and is followed by a number of findings with the target architecture (i.e. present in table1). According to the InterPro annotation, A0A3M0KYX4 has a Rab domain and a SOCS-box domain; since Rab is a subfamily of the Ras family, and many proteins from table1 are annotated as something like "Ras-related Rab protein", this protein may still be considered to have the target domain architecture. The Pfam profile for Ras seems to be sensitive to the whole family, so it is quite unexpected to find the Ras domain unidentified by Pfam in this protein. The bulk of non-homologous findings starts, however, after the score falls below 58:
table2[table2["in_table1"] == False].head(10)
However, there are some target findings with scores below 58. A0A6I9Y2F8 (score = 48.6) has been manually verified to have the target architecture, in agreement with its presence in table1. Nevertheless, the next finding, A0A7G3AK77 (score = -54.3), though present in table1, has an inverted domain architecture (SOCS+Ras) and so is not a target finding:
table2[table2["in_table1"] == True].tail(10)
There is a steep score decrease after the frequently occurring value of score = 325. However, there are a lot of target findings below this score, and the sequences with this stereotypic score of 325 all seem to belong to the large class Aves (just numerous homologues):
table2.iloc[1220:1245, :]
Score decline graph:
fig = plt.figure(figsize=[9, 7])
plt.plot(table2.index, table2["score"])
plt.plot([-25, 1300], [325, 325], linestyle="--")
plt.plot([-25, 1300], [58, 58], linestyle="--")
plt.xlim(-25, 1300)
plt.xticks(size=12)
plt.yticks(size=12)
plt.text(150, 337, "score cutoff = 325", fontsize=12, color="orange")
plt.text(150, 70, "score cutoff = 58", fontsize=12, color="green")
plt.xlabel("Finding number, sorted by score", size=14)
plt.ylabel("Score", size=14)
plt.grid(True)
plt.title("Score decline graph", size=16)
plt.show()
So two cutoffs can be defined: a strict one (score > 325), based on the steep step in the score decline graph, and a relaxed one (score > 58), based on the score of the second false finding. The relaxed cutoff seems better (more sensitive than the strict one, though equally specific), which is evident from the confusion matrices below:
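One quantitative way to compare candidate cutoffs is Youden's J statistic, J = Sensitivity + Specificity - 1: the cutoff maximising J offers the best balance of the two rates. A sketch on invented stand-in values for the real table2 columns:

```python
import pandas as pd

# Illustrative scores and rates only; the real values live in table2
toy = pd.DataFrame({
    "score":       [400, 325, 300, 58, 40],
    "Sensitivity": [0.60, 0.80, 0.95, 1.00, 1.00],
    "Specificity": [1.00, 1.00, 0.999, 0.999, 0.95],
})
toy["J"] = toy["Sensitivity"] + toy["Specificity"] - 1
print(int(toy.loc[toy["J"].idxmax(), "score"]))  # -> 58
```

On numbers shaped like these, a relaxed cutoff that trades a sliver of specificity for full sensitivity wins, consistent with the choice argued above.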
Confusion matrix for the strict cutoff (score cutoff = 325):
conf_matrix_strict = pd.DataFrame(
    {"in table_1": [table2[table2["score"] > 325]["in_table1"].agg(np.sum),
                    table2[table2["score"] <= 325]["in_table1"].agg(np.sum)],
     "not in table_1": [(~table2[table2["score"] > 325]["in_table1"]).agg(np.sum),
                        (~table2[table2["score"] <= 325]["in_table1"]).agg(np.sum)]},
    index=["predicted with hmm", "not predicted with hmm"])
conf_matrix_strict
with pd.ExcelWriter("tables.xlsx", engine="openpyxl", mode="a") as writer:
    conf_matrix_strict.to_excel(writer, sheet_name="score_cutoff_325", index=True)
Confusion matrix for the relaxed cutoff (score cutoff = 58):
conf_matrix_relaxed = pd.DataFrame(
    {"in table_1": [table2[table2["score"] > 58]["in_table1"].agg(np.sum),
                    table2[table2["score"] <= 58]["in_table1"].agg(np.sum)],
     "not in table_1": [(~table2[table2["score"] > 58]["in_table1"]).agg(np.sum),
                        (~table2[table2["score"] <= 58]["in_table1"]).agg(np.sum)]},
    index=["predicted with hmm", "not predicted with hmm"])
conf_matrix_relaxed
with pd.ExcelWriter("tables.xlsx", engine="openpyxl", mode="a") as writer:
    conf_matrix_relaxed.to_excel(writer, sheet_name="score_cutoff_58", index=True)
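The same confusion matrices can also be produced in a single call with pd.crosstab; a toy table2 stand-in (scores and labels invented):

```python
import pandas as pd

t = pd.DataFrame({"score": [400, 100, 50, 10],
                  "in_table1": [True, True, True, False]})
# Rows: whether the score clears the cutoff; columns: true membership
cm = pd.crosstab(t["score"] > 58, t["in_table1"],
                 rownames=["predicted with hmm"], colnames=["in table_1"])
print(cm.loc[True, True])  # -> 2
```

The explicit-DataFrame versions above keep full control of the row labels and row order, which matters for the Excel sheets; crosstab is just the terser alternative.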
Analysis complete